Filtering the time sequences of spectral parameters for speech recognition,

نویسندگان

  • Climent Nadeu
  • Pau Pachès-Leal
  • Biing-Hwang Juang
چکیده

In automatic speech recognition, the signal is usually represented by a set of time sequences of spectral parameters Ž . TSSPs that model the temporal evolution of the spectral envelope frame-to-frame. Those sequences are then filtered either Ž . to make them more robust to environmental conditions or to compute differential parameters dynamic features which enhance discrimination. In this paper, we apply frequency analysis to TSSPs in order to provide an interpretation framework for the various types of parameter filters used so far. Thus, the analysis of the average long-term spectrum of the successfully filtered sequences reveals a combined effect of equalization and band selection that provides insights into TSSP filtering. Also, we show in the paper that, when supplementary differential parameters are not used, the recognition rate can be improved even for clean speech, just by properly filtering the TSSPs. To support this claim, a number of experimental results are presented, both using whole-word and subword based models. The empirically optimum filters attenuate the low-pass band and emphasize a higher band so that the peak of the average long-term spectrum of the output of these filters Ž . lies at around the average syllable rate of the employed database f3 Hz . q 1997 Elsevier Science B.V.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Classification of emotional speech using spectral pattern features

Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...

متن کامل

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Abstract   Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

متن کامل

Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering

Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimations in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined system of equatio...

متن کامل

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

متن کامل

Correlation between Auditory Spectral Resolution and Speech Perception in Children with Cochlear Implants

Background: Variability in speech performance is a major concern for children with cochlear implants (CIs). Spectral resolution is an important acoustic component in speech perception. Considerable variability and limitations of spectral resolution in children with CIs may lead to individual differences in speech performance. The aim of this study was to assess the correlation between auditory ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Speech Communication

دوره 22  شماره 

صفحات  -

تاریخ انتشار 1997